I. PROLOGUE

Let us consider an arbitrary text written by means 
of a 16-letter alphabet, say: a, b, c, . . . , n, o, p. Let us 
regroup as large a part of the text as possible into quadruples belonging to the set Q = {aeim, afim, agim, . . . , dhlm, dhln, dhlo, dhlp}, formed by the strings obtained by picking out a single letter from each row of the matrix

$$\begin{pmatrix} a & b & c & d\\ e & f & g & h\\ i & j & k & l\\ m & n & o & p \end{pmatrix}$$

as one moves downwards starting from the first row.
Now let us define the functions F and G by $F(a) = F(d) = F(e) = F(h) = F(i) = F(l) = F(m) = F(p) = +1$, $F(b) = F(c) = F(f) = F(g) = F(j) = F(k) = F(n) = F(o) = -1$, and $G(x_1x_2x_3x_4) = F(x_1) + F(x_2) + F(x_3) - F(x_4)$. On each four-character string of the regrouped part of the text we evaluate G and take its average value $\langle G\rangle$.
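The procedure can be sketched in a few lines of Python. This is only an illustration under an assumption the text leaves open: we take the regrouping to scan the text in consecutive, non-overlapping blocks of four characters and to keep the blocks that belong to Q. The function names (`F`, `in_Q`, `G`, `regroup`, `average_G`) are our own.

```python
ROWS = ["abcd", "efgh", "ijkl", "mnop"]  # rows of the 4 x 4 letter matrix
PLUS = set("adehilmp")                    # letters with F = +1; the other eight have F = -1


def F(letter: str) -> int:
    """Value of F on a single letter of the 16-letter alphabet."""
    return 1 if letter in PLUS else -1


def in_Q(quad: str) -> bool:
    """True if the k-th character is taken from the k-th row of the matrix."""
    return len(quad) == 4 and all(ch in row for ch, row in zip(quad, ROWS))


def G(quad: str) -> int:
    """G(x1 x2 x3 x4) = F(x1) + F(x2) + F(x3) - F(x4)."""
    x1, x2, x3, x4 = quad
    return F(x1) + F(x2) + F(x3) - F(x4)


def regroup(text: str) -> list:
    """Assumed regrouping rule: non-overlapping 4-character blocks that lie in Q."""
    blocks = [text[i:i + 4] for i in range(0, len(text) - 3, 4)]
    return [b for b in blocks if in_Q(b)]


def average_G(text: str) -> float:
    """The contextuality measure <G> over the regrouped part of the text."""
    quads = regroup(text)
    if not quads:
        raise ValueError("no quadruple of the text belongs to Q")
    return sum(G(q) for q in quads) / len(quads)


print(average_G("aeimbfjnzzzz"))  # "aeim" gives +2, "bfjn" gives -2, "zzzz" is discarded -> 0.0
```

A different regrouping rule (for instance, overlapping windows) would only change `regroup`; F and G are fixed by the definitions above.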

The above awkward-looking manipulation with the 
text is an example of a procedure one might find in a 
paper on quantitative linguistics or semantic analysis. 
The analysis reveals certain correlational or contextual 
aspects of the text, the role of the contextuality measure 
being played by the average (G). 

To see what kind of correlation one can capture, let us parametrize the alphabet by primed and unprimed bits 0, 1, 0', 1':

a = (00), b = (01), c = (10), d = (11),

e = (00'), f = (01'), g = (10'), h = (11'),

i = (0'0), j = (0'1), k = (1'0), l = (1'1),

m = (0'0'), n = (0'1'), o = (1'0'), p = (1'1').

After the reparametrization the regrouped text might 
represent data of an experiment testing the Bell inequal- 
ity and the function F represents values of the Bell 
observable for a single pair of measurements. And vice 
versa, any result of an experiment that tests the Bell in- 
equality can be represented as a text written in a 16-letter 
alphabet. 
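To make this dictionary concrete, the following Python sketch (all names are ours) decodes each letter into the pair of settings (unprimed or primed) and the pair of bits, and checks that F is simply the product of the two ±1 outcomes, $F = (-1)^{x\oplus y}$. It also enumerates every deterministic "local" way of producing a quadruple, in which each party answers each setting with a fixed bit; every such quadruple has $G = \pm 2$, which is why, in the standard CHSH argument, any mixture of such sources keeps $|\langle G\rangle| \le 2$.

```python
from itertools import product

ALPHABET = "abcdefghijklmnop"
# Row of the letter matrix -> (setting of first bit, setting of second bit); 0 = unprimed, 1 = primed.
SETTINGS = {0: (0, 0), 1: (0, 1), 2: (1, 0), 3: (1, 1)}


def decode(letter):
    """Return ((s1, x), (s2, y)) for the letter (x y); e.g. e = (00') -> ((0, 0), (1, 0))."""
    row, col = divmod(ALPHABET.index(letter), 4)
    s1, s2 = SETTINGS[row]
    x, y = divmod(col, 2)
    return (s1, x), (s2, y)


def F(letter):
    (_, x), (_, y) = decode(letter)
    return (-1) ** (x ^ y)  # product of the +-1 outcomes (-1)**x and (-1)**y


def G(quad):
    return F(quad[0]) + F(quad[1]) + F(quad[2]) - F(quad[3])


def local_quadruple(a, b):
    """Quadruple emitted when party 1 answers bit a[s] and party 2 answers bit b[s] to setting s."""
    letters = []
    for row in range(4):
        s1, s2 = SETTINGS[row]
        letters.append(ALPHABET[4 * row + 2 * a[s1] + b[s2]])
    return "".join(letters)


values = {G(local_quadruple(a, b))
          for a in product((0, 1), repeat=2) for b in product((0, 1), repeat=2)}
print(sorted(values))  # [-2, 2]
```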

The result of the form $|\langle G\rangle| > 2$ reveals a nonclassical probabilistic structure behind the text. This structure is, of course, a property of the source of the text, since the text itself may be a simple collection of characters on a computer printout. Actually, we can immediately identify the nonclassical elements disclosed by $|\langle G\rangle| > 2$: the bits 0 and 0' (or 1 and 1') correspond to nonorthogonal vectors, and ordered pairs such as (01) are represented by tensor products. The possibility of hiding information behind nonorthogonal bases is the key idea of quantum cryptography, and tensor representations of conjunctions are fundamental to quantum information theory (QIT). Bell's observation that correlations between symbols in "texts" may reveal the presence of nonorthogonal bases is perhaps the most ingenious ingredient of his famous paper [?].
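The nonclassical value can be reproduced with a short numpy computation. We assume (our choice, not stated in the text) that the source emits the two-qubit state $|\Phi^+\rangle = (|00\rangle + |11\rangle)/\sqrt2$, that the unprimed and primed bits of each party are read out in real bases rotated by fixed angles, so that for instance the vectors for 0 and 0' of the first party are nonorthogonal, and that an ordered pair (xy) is represented by the tensor product of the two basis vectors. The angles are the standard ones that maximize the CHSH expression.

```python
import numpy as np


def basis(theta):
    """Orthonormal pair (|0_theta>, |1_theta>) rotated by theta in the real plane."""
    return (np.array([np.cos(theta), np.sin(theta)]),
            np.array([-np.sin(theta), np.cos(theta)]))


# Hypothetical measurement angles: index 0 = unprimed bit, index 1 = primed bit.
ANGLES_1 = {0: 0.0, 1: np.pi / 4}
ANGLES_2 = {0: np.pi / 8, 1: -np.pi / 8}
PHI_PLUS = (np.kron([1, 0], [1, 0]) + np.kron([0, 1], [0, 1])) / np.sqrt(2)


def letter_probabilities(s1, s2):
    """p(xy) = |<x (x) y | Phi+>|^2: the ordered pair is the tensor product of the two bits."""
    b1, b2 = basis(ANGLES_1[s1]), basis(ANGLES_2[s2])
    return {(x, y): float(np.dot(np.kron(b1[x], b2[y]), PHI_PLUS) ** 2)
            for x in (0, 1) for y in (0, 1)}


def average_F(s1, s2):
    """Average of F = (-1)**(x XOR y) over the letters of one row of the matrix."""
    return sum((-1) ** (x ^ y) * p for (x, y), p in letter_probabilities(s1, s2).items())


# Rows: (unprimed, unprimed), (unprimed, primed), (primed, unprimed), (primed, primed).
avg_G = average_F(0, 0) + average_F(0, 1) + average_F(1, 0) - average_F(1, 1)
print(round(avg_G, 4))  # 2.8284
```

With these angles each of the four averages of F equals $\pm 1/\sqrt2$, so $\langle G\rangle = 2\sqrt2$, the largest value quantum mechanics allows (Tsirelson's bound).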

The idea that some sort of mathematical manipulation of texts, or some apparently artificial mathematical representation of them, may reveal deep structures such as similarity of meaning or other nontrivial correlations is at the roots of semantic analysis (SA). Still another field where analogies with the Bell inequality example are particularly striking is related to neural-network distributed representations of concepts [?]. The links of such scientific disciplines with quantum mechanics, and QIT in particular, are as yet almost unexplored. The present paper is an attempt to fill this gap [?].

II. VECTOR MODELS OF TEXTS 

Modern approaches to SA typically model words and their meanings by vectors from finite-dimensional vector spaces. Prominent examples of such approaches are Latent Semantic Analysis (LSA) [?], Hyperspace Analogue to Language (HAL) [?], Probabilistic Latent Semantic Analysis (pLSA) [?], Latent Dirichlet Allocation [10], Topic Model [11], and Word Association Space (WAS) [12]. In the present Letter we concentrate on a simplified version of LSA, but we believe the discussion we present can be applied to all vector models of language and concept representation.

SA is typically based on text co-occurrence matrices and data-analysis techniques employing singular value decomposition (SVD). Various models of SA provide powerful methods of determining similarity of meaning of words and passages by analysis of large text corpora. The procedures are fully automatic and allow texts to be analyzed by computers without the involvement of any human understanding. What makes LSA particularly impressive are the experiments simulating human performance. LSA-programmed machines were able to pass multiple-choice exams such as the Test of English as a Foreign Language (TOEFL) (after training on general English) [13] or, after learning from an introductory psychology textbook, a final exam for psychology students [?].

These and other achievements of LSA raise the question of its relevance for the problem of brain functioning and AI [14]. However, the element we find particularly intriguing, and which is the main topic of our paper, is the similarity between LSA and the formal structures of QIT.

LSA is essentially a Hilbert-space formalism. One represents words by vectors spanning a finite-dimensional space, and text passages are represented by linear combinations of such words, with appropriate weights related to the frequency of occurrence of the words in the text. Similarity of meaning is represented by scalar products between certain word-vectors (belonging to the so-called semantic space).

In QIT, words, also treated as vectors, are processed by quantum algorithms or encoded/decoded by means of quantum cryptographic protocols. Although one has started to think of quantum programming languages [15, 16, 17], the semantic issues of quantum texts are difficult to formulate. In this context LSA is a natural candidate for a starting point of "quantum linguistics".

Still, LSA has certain conceptual problems of its own. As stressed by many authors, the greatest difficulty of LSA is that it treats a text passage as a "bag of words", a set where order is irrelevant [18]. The difficulty is a serious one, since it is intuitively clear that syntax is important for evaluating the meaning of a text. The sentences "Mary hit John" and "John hit Mary" cannot be distinguished by LSA; "Mary did hit John" and "John did not hit Mary" have practically identical LSA representations, because "not" is a very short vector in LSA [?]. What LSA can capture is that the sentences are about violence.

We think that experience from QIT may prove useful here. A basic object in QIT is not a word but a letter. Typically one works with the binary alphabet consisting of 0 and 1, and with qubits. Ordering of qubits is obtained by means of the tensor product. Ordering of words can be obtained in the same way, but before we proceed with the QIT formalism, let us explain the standard LSA and formulate it in quantum-mechanical notation.
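A toy numpy illustration of the last point, using a hypothetical three-word vocabulary with one orthonormal basis vector per word: the LSA-style sum of word vectors cannot tell "Mary hit John" from "John hit Mary", whereas the tensor product of the word vectors, taken in reading order, maps the two sentences to orthogonal vectors.

```python
import numpy as np

VOCAB = ["mary", "hit", "john"]                       # hypothetical toy vocabulary
BASIS = {w: np.eye(len(VOCAB))[i] for i, w in enumerate(VOCAB)}


def bag_of_words(sentence):
    """LSA-style vector: sum of the word vectors, so word order is lost."""
    return sum(BASIS[w] for w in sentence.lower().split())


def ordered(sentence):
    """Tensor product of the word vectors in reading order (an element of H^(x)K)."""
    vec = np.array([1.0])
    for w in sentence.lower().split():
        vec = np.kron(vec, BASIS[w])
    return vec


s, t = "Mary hit John", "John hit Mary"
print(np.array_equal(bag_of_words(s), bag_of_words(t)))  # True: identical bags
print(float(ordered(s) @ ordered(t)))                    # 0.0: orthogonal ordered phrases
```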



III. SEMANTIC ANALYSIS: AN ILLUSTRATION

Let us consider the following passage:

"(S1) How much wood would a woodchuck chuck if a woodchuck could chuck wood? (S2) Woodchuck would chuck as much wood as a woodchuck could chuck if a woodchuck could chuck wood. (S3) Could woodchuck chuck 35 cubic feet of dirt? (S4) If a woodchuck could chuck wood woodchuck would chuck 700 pounds of wood."

The LSA matrix representation of the passage is a $16\times 4$ word-by-sentence matrix $A_0$: its rows correspond to words, its columns to the sentences S1, ..., S4, and its entry $(A_0)_{mn}$ is the number of occurrences of the $m$th word in the $n$th sentence. It is usual to pre-process $A_0$ by multiplying each entry by a function associated with the entropy of the appropriate word, evaluated on the basis of the entire text. What kind of co-occurrence matrix one should associate with a text is actually an open question, investigated in various alternatives to LSA (HAL, WAS, Topic Model). For simplicity we skip this point.

The text now corresponds to the map $A_0: \mathbb R^4 \to \mathbb R^{16}$, whose SVD (up to numerical round-off errors) is $A_0 = W D_0 V$, where the $16\times 4$ matrix $W$ has orthonormal columns, $D_0$ is the $4\times 4$ diagonal matrix of singular values, and $V$ is a $4\times 4$ orthogonal matrix. The essential step of LSA is the reduction

$$A_0 = W D_0 V \mapsto A_1 = W D_1 V,$$

where $D_1 = PD_0$ and $P$ is a projector commuting with $D_0$.

We will not go very deeply into the details of how and why a reduced representation of the type illustrated by $A_1$ may allow a computer to pass TOEFL no worse than an average non-native speaker who wants to study in the USA, and refer the reader to publications on LSA. For our purposes it is sufficient to know that the rows of $A_1$ are termed word-vectors and the space of word-vectors is known as the semantic space. Cosines between two word-vectors (or just their scalar products) measure a semantic distance (similarity of meaning) between words within a given set of text corpora represented by $A$. Importantly, SVD can make some entries of $A_1$ negative and can even make some scalar products negative, the latter occurring for antonyms. After SVD, the coefficients of word-vectors lose the simple link to frequencies of occurrences of words.

Of course, the dimensions appearing in real texts investigated by means of LSA are much greater (for example, 30473 columns and 60768 rows in the experiment discussed in [?]). Experience shows that the analysis is most efficient if the projector $P$ projects on a subspace of dimension around 300, but the meaning of this dimension is still a subject of speculation [19].

IV. SEMANTIC ANALYSIS IN QUANTUM NOTATION

In our example the matrix $W$ is not square, but its columns are orthonormal. Taking any 12 orthonormal vectors that are, in addition, orthogonal to the columns of $W$, we can replace $W$ by a $16\times 16$ unitary matrix $\tilde W$ whose first 4 columns coincide with those of $W$, and end up with an SVD of the form

$$\tilde A_k = (A_k, 0) = \tilde W \begin{pmatrix} D_k & 0\\ 0 & 0 \end{pmatrix} \begin{pmatrix} V & 0\\ 0 & V^{\perp} \end{pmatrix} = \tilde W \tilde D_k \tilde V,$$

$k = 0, 1$, where all the matrices are square and $V^{\perp}$ is an arbitrary unitary matrix of appropriate dimension. The map $A_k \mapsto \tilde A_k$ neither adds nor removes any information from the text; its only purpose is to work with text matrices and SVDs that may be regarded as operators mapping a certain Hilbert space $\mathcal H$ into itself.

The Hilbert space $\mathcal H$ is finite dimensional, but in principle one cannot impose any limitation on the number of words or sentences one wants to take into account. It is therefore natural to treat all the concrete examples as subspaces of an infinite-dimensional Hilbert space of all possible words. Whether sentences or other text units are regarded as collections of words or as new words is a matter of convention. Assume each word of a vocabulary is represented by a basis vector $|n\rangle$, where $n$ is a natural number. The text matrix ($A = \tilde A_0$ or $A = \tilde A_1$) corresponds to the operator $A = \sum_{mn} A_{mn}|m\rangle\langle n|$. The column representing the $n$th sentence is given by the (unnormalized) vector


$$|s_n\rangle = A|n\rangle = \sum_m A_{mn}|m\rangle. \qquad (8)$$

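For example, the sentence S2 is represented in LSA by the sentence-vector

$$|s_2\rangle = |2\rangle + |4\rangle + |8\rangle + 2\big(|3\rangle + |5\rangle + |9\rangle + |10\rangle\big) + 3\big(|6\rangle + |7\rangle\big),$$

where the words are numbered in the order of their first appearance in the passage ($|1\rangle$ = how, $|2\rangle$ = much, $|3\rangle$ = wood, $|4\rangle$ = would, $|5\rangle$ = a, $|6\rangle$ = woodchuck, $|7\rangle$ = chuck, $|8\rangle$ = if, $|9\rangle$ = could, $|10\rangle$ = as).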



After SVD the coefficients of a sentence-vector are typically neither natural nor positive. Let us note that $|s_2\rangle$ is not a word-vector in the sense of LSA, but a sentence-vector: word-vectors are the rows of the text matrix. The rows are obtained from $A$ by $\langle w_m| = \langle m|A$. The similarity of meaning of, say, "how" and "much" is given by $\cos(\text{how}, \text{much}) = \langle w_1|w_2\rangle / (\|w_1\|\,\|w_2\|)$. (Recall that LSA gives an optimal characterization of meaning if one calculates the scalar product after the reduction $D_0 \mapsto D_1 = PD_0$ with an appropriately chosen $P$; in the example, before the reduction $\cos(\text{how}, \text{much}) = 0.707107$, and after the reduction $\cos(\text{how}, \text{much}) = 0.999985$.)
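The numbers in this example can be checked with numpy. The sketch below is not the authors' code: the tokenizer (lower-casing and stripping "?" and ".") is our assumption, and with it the passage yields 17 distinct words rather than the 16 rows mentioned in the text, so the tokenization behind the original matrix evidently differs slightly. The cosine between "how" (row (1, 0, 0, 0)) and "much" (row (1, 1, 0, 0)) before the reduction does not depend on this detail and equals $1/\sqrt2$. For the reduction we take, as a hypothetical choice, the projector $P$ that keeps the two largest singular values; the value it produces need not coincide with the one quoted above.

```python
import numpy as np

# Sentences S1..S4 of the passage; the tokenizer below is an assumption.
SENTENCES = [
    "How much wood would a woodchuck chuck if a woodchuck could chuck wood?",
    "Woodchuck would chuck as much wood as a woodchuck could chuck if a woodchuck could chuck wood.",
    "Could woodchuck chuck 35 cubic feet of dirt?",
    "If a woodchuck could chuck wood woodchuck would chuck 700 pounds of wood.",
]


def tokens(sentence):
    return sentence.lower().replace("?", "").replace(".", "").split()


# Words numbered in order of first appearance: how, much, wood, would, a, ...
vocab = list(dict.fromkeys(w for s in SENTENCES for w in tokens(s)))
# (A0)_{mn} = number of occurrences of word m in sentence n.
A0 = np.array([[tokens(s).count(w) for s in SENTENCES] for w in vocab], dtype=float)

W, d0, V = np.linalg.svd(A0, full_matrices=False)  # A0 = W @ diag(d0) @ V


def reduce_rank(k):
    """A1 = W P D0 V, with P (hypothetical choice) keeping the k largest singular values."""
    P = np.diag([1.0] * k + [0.0] * (len(d0) - k))
    return W @ P @ np.diag(d0) @ V


def cos(A, u, v):
    """Cosine between the word-vectors (rows) of words u and v."""
    a, b = A[vocab.index(u)], A[vocab.index(v)]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


print(A0.shape)                                      # (17, 4) with this tokenizer
print(round(cos(A0, "how", "much"), 6))              # 0.707107 before the reduction
print(round(cos(reduce_rank(2), "how", "much"), 6))  # after a rank-2 reduction
```

The second column of `A0` reproduces the coefficients of $|s_2\rangle$ given above.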

Putting this differently, the word-vectors characteristic of a text represented by the operator $A$ are given by $|w_m\rangle = A^\dagger|m\rangle$. The matrix representing similarities of meaning between all possible pairs of words corresponding to the text $A$ is thus given by

$$\cos(m\text{th word}, n\text{th word}) = \frac{\langle m|AA^\dagger|n\rangle}{\sqrt{\langle m|AA^\dagger|m\rangle\,\langle n|AA^\dagger|n\rangle}}.$$



As we can see, in LSA the entire information about mutual relations between words is encoded in the operator $\rho = AA^\dagger$. Taking into account (8) and the resolution of unity $1 = \sum_n |n\rangle\langle n|$, we can write

$$\rho = \sum_n A|n\rangle\langle n|A^\dagger = \sum_n |s_n\rangle\langle s_n| = \sum_n p^s_n\,|\sigma_n\rangle\langle\sigma_n|, \qquad (9)$$

with $p^s_n = \langle s_n|s_n\rangle$ and $\langle\sigma_n|\sigma_n\rangle = 1$. Since in any practical application the number of words is finite, the sum in (9) is finite as well and $\mathrm{Tr}\,\rho = \|A\|^2_{HS} = \sum_n \lambda_n < \infty$, where $\lambda_n$ are the eigenvalues of $N = A^\dagger A$ and $\|\cdot\|_{HS}$ is the Hilbert-Schmidt norm. For this reason $\rho$ is formally an unnormalized density matrix of the set of sentences.
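The operator identities of this paragraph are easy to check numerically. The sketch below uses a hypothetical 5 × 3 count matrix with real entries (so $A^\dagger = A^T$) and does not pad it to a square matrix, which the identities do not require.

```python
import numpy as np

# Hypothetical count matrix: 5 words (rows) x 3 sentences (columns).
A = np.array([[1, 0, 0],
              [1, 1, 0],
              [0, 2, 1],
              [0, 0, 3],
              [2, 1, 0]], dtype=float)

rho = A @ A.T   # rho = A A^dagger
N = A.T @ A     # N = A^dagger A

# Eq. (9): rho is the sum of the projectors |s_n><s_n| on the (unnormalized) sentence-vectors.
rho_from_sentences = sum(np.outer(A[:, n], A[:, n]) for n in range(A.shape[1]))

# Word-similarity matrix: cos(m, n) = rho_mn / sqrt(rho_mm rho_nn).
d = np.sqrt(np.diag(rho))
cos_words = rho / np.outer(d, d)

print(np.allclose(rho, rho_from_sentences))                    # True
print(np.trace(rho), np.sum(A ** 2))                           # 22.0 22.0 (squared Hilbert-Schmidt norm)
print(np.isclose(np.trace(rho), np.linalg.eigvalsh(N).sum()))  # True: Tr rho = sum of eigenvalues of N
```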

The operator $N$ plays an essential role in LSA. To see this, let us look at an explicit proof of SVD formulated in quantum notation (physicists will recognize here the so-called Schmidt decomposition). Let $|\lambda_n\rangle$ be a normalized eigenvector of $N$, i.e. $N|\lambda_n\rangle = \lambda_n|\lambda_n\rangle$. Denoting $|\alpha_n\rangle = A|\lambda_n\rangle$, we compute

$$A = \sum_n A|\lambda_n\rangle\langle\lambda_n| = \sum_n |\alpha_n\rangle\langle\lambda_n| = \sum_k \sqrt{\lambda_k}\,|\beta_k\rangle\langle\lambda_k|, \qquad (10)$$

where $|\beta_k\rangle = |\alpha_k\rangle/\|\alpha_k\|$ if $\lambda_k > 0$, or any other basis vector from the subspace corresponding to $\lambda_k = 0$ if $\lambda_k = 0$. (The last equality uses $\|\alpha_k\|^2 = \langle\lambda_k|A^\dagger A|\lambda_k\rangle = \lambda_k$; the same computation gives $\langle\alpha_j|\alpha_k\rangle = \lambda_k\delta_{jk}$, so the $|\beta_k\rangle$ are orthonormal.) It is clear that the singular values in SVD are given by $\sqrt{\lambda_k}$. The LSA procedure is essentially equivalent to the spectral analysis of $N$.
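The proof translates directly into an algorithm: diagonalize $N$, map its eigenvectors with $A$, and normalize. The numpy sketch below (function name ours) keeps only the terms with $\lambda_k$ above a small tolerance, which is harmless because terms with $\lambda_k = 0$ have $|\alpha_k\rangle = 0$ and contribute nothing to (10). It illustrates the argument and is not meant to replace `numpy.linalg.svd`, which is numerically more robust.

```python
import numpy as np


def svd_from_N(A, tol=1e-12):
    """Compact SVD of A built from the spectral decomposition of N = A^T A, as in eq. (10).

    Returns (sqrt_lam, betas, lams) with A = sum_k sqrt(lam_k) |beta_k><lambda_k|,
    where the columns of `betas` and `lams` hold |beta_k> and |lambda_k>.
    """
    lam, X = np.linalg.eigh(A.T @ A)       # N|lambda_k> = lam_k |lambda_k>
    order = np.argsort(lam)[::-1]          # largest eigenvalues first
    lam, X = lam[order], X[:, order]
    keep = lam > tol                       # terms with lam_k = 0 have alpha_k = 0
    alphas = A @ X[:, keep]                # |alpha_k> = A|lambda_k>, ||alpha_k|| = sqrt(lam_k)
    sqrt_lam = np.sqrt(lam[keep])
    betas = alphas / sqrt_lam              # |beta_k> = |alpha_k> / ||alpha_k||
    return sqrt_lam, betas, X[:, keep]


rng = np.random.default_rng(0)
A = rng.normal(size=(6, 4))                # hypothetical 6 x 4 text matrix
s, B, L = svd_from_N(A)
print(np.allclose(B @ np.diag(s) @ L.T, A))                # True: A is recovered
print(np.allclose(s, np.linalg.svd(A, compute_uv=False)))  # True: singular values = sqrt(lam_k)
print(np.allclose(B.T @ B, np.eye(len(s))))                # True: the |beta_k> are orthonormal
```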

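Let us finally note that $N$ can be written as

$$N = A^\dagger \sum_n |n\rangle\langle n| A = \sum_n |w_n\rangle\langle w_n| = \sum_n p^w_n\,|\omega_n\rangle\langle\omega_n|,$$

with $p^w_n = \langle w_n|w_n\rangle$ and $\langle\omega_n|\omega_n\rangle = 1$, i.e. as an unnormalized density matrix representing a mixture of word-vectors.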



V. SUPERSYMMETRY AND DIMENSIONAL REDUCTIONS

The duality between sentence-vectors and word-vectors, one manifestation of which is the link $AA^\dagger \leftrightarrow A^\dagger A$, is well known from supersymmetric theories [20]. In supersymmetric terminology the operators $AA^\dagger$ and $A^\dagger A$ are known as superpartners.

The dimensional reduction employed in LSA is performed on the spectrum of $N$. Since one eliminates small eigenvalues in this way, the procedure is analogous to a kind of purification of word-vector density matrices. But one of the standard results of supersymmetric quantum mechanics states that $N$ and $\rho$ are isospectral. The interchange of $N$ and $\rho$ is equivalent to replacing word-vectors by sentence-vectors. Dimensional reduction can thus be performed for both $N$ and $\rho$; in the latter case the reduction deals with sentence-vector density matrices. Finally, one can combine the two approaches. A "supersymmetric LSA" can be based on the supercharges

$$Q = \begin{pmatrix} 0 & A\\ A^\dagger & 0 \end{pmatrix}$$

and the two density matrices taken simultaneously in $H = Q^2 = \rho \oplus N$.
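A quick numerical check of this structure, with a hypothetical 5 × 3 real text matrix: the square of $Q$ is block diagonal with blocks $\rho$ and $N$, and the nonzero eigenvalues of $\rho = AA^T$ and $N = A^TA$ coincide. For a non-square $A$ the two operators differ only in the multiplicity of the eigenvalue 0; after the padding to square matrices described in Sec. IV they are exactly isospectral.

```python
import numpy as np

# Hypothetical 5-word x 3-sentence text matrix with real entries (so A^dagger = A^T).
A = np.array([[1, 0, 2],
              [0, 1, 1],
              [3, 1, 0],
              [0, 2, 0],
              [1, 0, 1]], dtype=float)
m, n = A.shape

rho, N = A @ A.T, A.T @ A
# Supercharge Q = [[0, A], [A^T, 0]] acting on (word space) (+) (sentence space).
Q = np.block([[np.zeros((m, m)), A], [A.T, np.zeros((n, n))]])
H = Q @ Q

print(np.allclose(H, np.block([[rho, np.zeros((m, n))], [np.zeros((n, m)), N]])))  # True: H = rho (+) N

ev_rho = np.sort(np.linalg.eigvalsh(rho))[::-1][:n]   # rho (5 x 5) has at most n = 3 nonzero eigenvalues
ev_N = np.sort(np.linalg.eigvalsh(N))[::-1]
print(np.allclose(ev_rho, ev_N))                      # True: the nonzero spectra coincide
```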

In addition to the above dimensional reductions, two further reductions are very natural from the viewpoint of our quantum interpretation. Note that in addition to the spectrum $\{\lambda_n\}$ we have two sets of "mixing parameters", $\{p^s_n\}$ and $\{p^w_n\}$. The relations between them are the following:

$$p^w_n = \langle w_n|w_n\rangle = \langle n|AA^\dagger|n\rangle = \rho_{nn}, \qquad (11)$$

$$p^s_n = \langle s_n|s_n\rangle = \langle n|A^\dagger A|n\rangle = N_{nn}. \qquad (12)$$

Elimination of small diagonal elements $\rho_{nn}$ or $N_{nn}$ is not equivalent to eliminating small eigenvalues of $N$ or $\rho$. However, after this type of "purification" the resulting operators $\rho$ and $N$ are still positive and, hence, can be factorized as $\rho = BB^\dagger$, $N = C^\dagger C$, leading effectively to two new types of reduction: $A \mapsto B$ and $A \mapsto C$.
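A sketch of the two diagonal "purifications", under an interpretation we have to assume: eliminating the element $\rho_{nn}$ is taken to mean removing the whole $n$th row and column of $\rho$ (setting them to zero), i.e. $\rho \mapsto P\rho P$ with a diagonal projector $P$. This is the reading under which positivity is automatic (setting only the diagonal entry to zero would in general destroy it), and the factorizations can then be written explicitly as $B = PA$ and $C = AP'$. The matrix and the threshold are hypothetical.

```python
import numpy as np

# Hypothetical text matrix: rows = words, columns = sentences.
A = np.array([[3, 1, 0, 2],
              [0, 0, 1, 0],
              [2, 2, 1, 0],
              [1, 0, 3, 1]], dtype=float)
rho, N = A @ A.T, A.T @ A      # diagonals: p^w_n = rho_nn (11), p^s_n = N_nn (12)
threshold = 6.0                # hypothetical cut-off for "small" diagonal elements

# Drop words with small rho_nn: rho -> P rho P, with P the diagonal projector on the kept words.
P = np.diag((np.diag(rho) >= threshold).astype(float))
rho_red = P @ rho @ P
B = P @ A                      # rho_red = B B^T: the reduction A -> B

# Drop sentences with small N_nn: N -> P' N P'.
Pp = np.diag((np.diag(N) >= threshold).astype(float))
N_red = Pp @ N @ Pp
C = A @ Pp                     # N_red = C^T C: the reduction A -> C

print(np.allclose(rho_red, B @ B.T), np.allclose(N_red, C.T @ C))  # True True
print(np.linalg.eigvalsh(rho_red).min() >= -1e-12)                 # True: still positive semidefinite
```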



VI. FOCK SPACE OF WORDS 

As we have seen, LSA can be formulated as a Hilbert-space problem. The "bag of words" analysis is performed in $\mathcal H$. Ordered sequences of words can, in principle, be constructed in exact analogy to ordered sequences of letters in QIT. Still, there is a subtlety we want to point out.

Consider a phrase, i.e. an ordered $n$-tuple of words, $(\text{word}_1, \dots, \text{word}_n)$. A quantum physicist's intuition tells us that the natural representation of the phrase is a tensor product of vectors representing the words. The difficulty is this: which vectors should one choose? The mutually orthogonal basis vectors $|j_1\rangle, \dots, |j_n\rangle$, or rather the associated word-vectors $|w_1\rangle = A^\dagger|j_1\rangle, \dots, |w_n\rangle = A^\dagger|j_n\rangle$?

Whatever representation one chooses, the phrase $(n_1, \dots, n_K)$ will be mapped into

$$|n_1 \dots n_K\rangle = |n_1\rangle \otimes \dots \otimes |n_K\rangle \in \mathcal H \otimes \dots \otimes \mathcal H = \mathcal H^{\otimes K}.$$

Including the empty word, we arrive at the Fock space of all text passages, $\mathcal H_F = \bigoplus_{K=0}^{\infty} \mathcal H^{\otimes K}$.

LSA is performed in $\mathcal H_F$ in exactly the same way as in $\mathcal H$, but the structures one can investigate are much richer. Taking as an example G. Stein's phrase "Rose is a rose is a rose is a rose", not only can we work with

$$|s_1\rangle = 4|\text{rose}\rangle + 3|\text{is}\rangle + 3|\text{a}\rangle \in \mathcal H, \qquad (13)$$

but also with vectors revealing the syntactic structures, for example,

$$|s_2\rangle = |\text{rose}\rangle + 3|\text{is}\rangle\otimes|\text{a}\rangle\otimes|\text{rose}\rangle \in \mathcal H \oplus \mathcal H^{\otimes 3} \subset \mathcal H_F,$$

$$|s_3\rangle = |\text{rose}\rangle + 3|\text{is}\rangle + 3|\text{a}\rangle\otimes|\text{rose}\rangle \in \mathcal H \oplus \mathcal H^{\otimes 2} \subset \mathcal H_F.$$

The above formulas show a typical feature of Fock spaces, namely superpositions of vectors belonging to different tensor powers. Interestingly, similar constructions are encountered in convolution-based memory models, such as TODAM [21] or Holographic Reduced Representations (HRRs) [?].
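Fock-space vectors of this kind can be manipulated with a very small data structure. In the Python sketch below (all names are ours) a vector is a dictionary from tuples of words, i.e. basis vectors of $\mathcal H^{\otimes K}$ with $K$ the tuple length, to coefficients; tuples of different lengths are orthogonal, so a single dictionary can hold a superposition of different tensor powers. Word basis vectors are assumed orthonormal, and `mixed` is an illustrative superposition of a one-word and a three-word component.

```python
from collections import defaultdict


def ket(word):
    return {(word,): 1.0}


def add(*vectors):
    out = defaultdict(float)
    for vec in vectors:
        for key, c in vec.items():
            out[key] += c
    return dict(out)


def scale(c, vec):
    return {key: c * v for key, v in vec.items()}


def tensor(u, v):
    """|u> (x) |v>: concatenate the word tuples and multiply the coefficients."""
    out = defaultdict(float)
    for ku, cu in u.items():
        for kv, cv in v.items():
            out[ku + kv] += cu * cv
    return dict(out)


def inner(u, v):
    """Scalar product; tuples of different lengths (different tensor powers) never match."""
    return sum(c * v.get(key, 0.0) for key, c in u.items())


VACUUM = {(): 1.0}  # the empty word, K = 0

s1 = add(scale(4, ket("rose")), scale(3, ket("is")), scale(3, ket("a")))  # bag of words, eq. (13)

ordered = VACUUM
for w in "rose is a rose is a rose is a rose".split():
    ordered = tensor(ordered, ket(w))  # a single vector in H^(x)10

mixed = add(ket("rose"), scale(3, tensor(tensor(ket("is"), ket("a")), ket("rose"))))

print(inner(s1, s1), inner(s1, ordered), inner(s1, mixed))  # 34.0 0.0 4.0
```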

VII. RELATION TO SMOLENSKY'S TENSOR PRODUCT BINDING

Smolensky [?] proposed tensor products of vectors as a means of solving the so-called binding problem: how to keep track of which features belong to which objects in a formal connectionist model of coding? In the linguistic context of SA the binding problem is equivalent to the problem of representing syntax. Links to quantum structures are particularly striking here, but there are also intriguing logical differences from what one would expect from a QIT perspective.

First, one represents an activity state of a network by a vector, which is very close to what a quantum physicist would do. In the comments to his Definition 2.1, Smolensky stresses that the vectors are always written in the same, fixed basis. So formally we do not really need vectors; $n$-tuples of numbers are enough. This is against the philosophy of QIT, where states are genuinely vectors and the same information may be encoded in non-parallel vectors.

The fact that a preferred basis is used becomes even more important in models such as TODAM or HRRs, where the tensor product is replaced by its "compressed form": convolution or circular convolution. Both operations are defined on $n$-tuples and not on vectors. Still, one can argue that in quantum measurement theory we do indeed deal with preferred pointer bases [?], and models such as HRRs may refer to this level of analysis.
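The sense in which convolution is a "compressed" tensor product, and in which it depends on a preferred basis, can be seen in a few lines of numpy. Circular convolution sums the entries $a_k b_l$ of the tensor (outer) product along the wrapped anti-diagonals $k + l \equiv j \pmod n$; the FFT version is the usual fast implementation via the convolution theorem. A change of basis by a permutation matrix $R$ (our example) commutes with the tensor product, $(Ra)\otimes(Rb) = (R\otimes R)(a\otimes b)$, but not with circular convolution.

```python
import numpy as np


def circular_convolution(a, b):
    """c_j = sum_k a_k b_{(j-k) mod n}: the n*n tensor a (x) b compressed to n numbers."""
    n = len(a)
    T = np.outer(a, b)                   # tensor (outer) product, T[k, l] = a_k b_l
    c = np.zeros(n)
    for k in range(n):
        for l in range(n):
            c[(k + l) % n] += T[k, l]    # sum along the wrapped anti-diagonals k + l = j (mod n)
    return c


def circular_convolution_fft(a, b):
    """The same operation via the convolution theorem (the usual fast implementation)."""
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))


e = np.eye(4)
a, b = e[0], e[1]
R = e[[2, 1, 0, 3]]                      # orthogonal change of basis swapping coordinates 0 and 2

print(circular_convolution(R @ a, R @ b))  # [0. 0. 0. 1.]
print(R @ circular_convolution(a, b))      # [0. 1. 0. 0.]: convolution is basis dependent
print(np.allclose(np.kron(R @ a, R @ b), np.kron(R, R) @ np.kron(a, b)))  # True: the tensor product is not
```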

A predicate $p(a, b)$, such as eat(John, fish), is represented by the vector $r_1 \otimes a + r_2 \otimes b$, where the vectors $r_k$ represent roles and $a$, $b$ are fillers. A predicate is, accordingly, given by an entangled activity state. A person trained in QIT would expect the vector to mean "role $r_1$ AND filler $a$, OR role $r_2$ AND filler $b$". Of course, Smolensky's intention was different: the sum is meant to represent the conjunction (AND) and not the alternative (OR). This feature is also characteristic of other neural-network models. Why is this so, and is this type of representation crucial for symbolic AI?
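A minimal numpy sketch of tensor product binding for eat(John, fish). The role and filler vectors are hypothetical, and the roles are chosen orthonormal so that unbinding by contraction with a role vector is exact. Written as a matrix (role index by filler index), $r_1\otimes a + r_2\otimes b$ has rank 2 whenever the fillers are linearly independent, which is the statement that the activity state is entangled rather than a product vector.

```python
import numpy as np

# Hypothetical role vectors (orthonormal) and filler vectors for eat(John, fish).
r1, r2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])          # roles: agent, patient
john, fish = np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.6, 0.8])

# r1 (x) john + r2 (x) fish, stored as a (role index) x (filler index) matrix.
T = np.outer(r1, john) + np.outer(r2, fish)


def unbind(T, role):
    """Contract the role index with a role vector, (<role| (x) 1)T; exact for orthonormal roles."""
    return role @ T


print(unbind(T, r1), unbind(T, r2))      # recovers john and fish
print(np.linalg.matrix_rank(T))          # 2: Schmidt rank 2, so T is not a product r (x) f
```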

The above similarities and differences show that possible implications of connectionist models for QIT, and vice versa, may be worth further study. We will not pursue these matters here.



VIII. EFFICIENCY OF TENSOR REPRESENTATIONS

Tensor products are more "economical" than Cartesian powers due to identifications of the type $(a|\psi\rangle)\otimes|\phi\rangle = |\psi\rangle\otimes(a|\phi\rangle) = a(|\psi\rangle\otimes|\phi\rangle)$ that do not hold in Cartesian products. Thus the Fock space automatically performs a kind of dimensional reduction, which is the main idea of both LSA and distributed representations.

If we are more interested in the issue of binding than 
in ordering of words then further compression of infor- 
mation is possible if one employs symmetric (bosonic) or 
antisymmetric (fermionic) Fock spaces. Symmetric ten- 
sor powers are closer to convolutions employed in HRRs 
but, unlike convolutions, are defined on vectors and not 
n-tuples of numbers. 
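The compression can be quantified by counting dimensions. For a word space of dimension $n$, the $k$th tensor power has dimension $n^k$, its symmetric (bosonic) part $\binom{n+k-1}{k}$, and its antisymmetric (fermionic) part $\binom{n}{k}$. A short Python helper (name ours):

```python
from math import comb


def tensor_power_dims(n, k):
    """(full, symmetric, antisymmetric) dimensions of the k-th tensor power of an n-dim space."""
    return n ** k, comb(n + k - 1, k), comb(n, k)


for n, k in [(2, 2), (3, 2), (1000, 3)]:
    print(n, k, tensor_power_dims(n, k))
```

For $k = 2$ the symmetric and antisymmetric parts together exhaust the full tensor square, $n^2 = n(n+1)/2 + n(n-1)/2$; for $k \ge 3$ they do not.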

Let us also note that in binary (or qubinary) representations all tensor powers can be decomposed into irreducible components, exactly as is done in 2-spinor calculus [24]. It is known that any irreducible representation corresponds to symmetric spinors and that any antisymmetric spinor is a scalar times the singlet (all antisymmetric two-index spinors are proportional to one another). So it is very natural indeed to employ representations based on symmetric operations as the main building blocks of, say, memory models (the convolution used in HRRs is also commutative).
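For two qubits this decomposition is easy to verify numerically: $\mathbb C^2\otimes\mathbb C^2$ splits into a 3-dimensional symmetric part and a 1-dimensional antisymmetric part spanned by the singlet $(|01\rangle - |10\rangle)/\sqrt2$. The numpy sketch below builds the swap operator explicitly (variable names ours).

```python
import numpy as np

n = 2
SWAP = np.zeros((n * n, n * n))
for i in range(n):
    for j in range(n):
        SWAP[i * n + j, j * n + i] = 1.0   # |i> (x) |j>  ->  |j> (x) |i>

sym = (np.eye(n * n) + SWAP) / 2           # projector onto symmetric two-index spinors
anti = (np.eye(n * n) - SWAP) / 2          # projector onto antisymmetric two-index spinors
singlet = (np.kron([1, 0], [0, 1]) - np.kron([0, 1], [1, 0])) / np.sqrt(2)

print(round(np.trace(sym)), round(np.trace(anti)))   # 3 1
print(np.allclose(anti, np.outer(singlet, singlet)))  # True: antisymmetric part = multiples of the singlet
```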

All these links are interesting from the point of view of the discussions between Penrose and proponents of classical AI [25]. If the brain is a quantum device, as suggested in [26], or, which is a weaker condition, if the conceptual part of the mind entails a formal quantum structure [?], then the presence of tensor structures in SA or AI will not be accidental.

In principle, the question of tensor representations of semantic aspects of texts can be settled experimentally. Document retrieval experiments based on quantum logic have already been performed [29], and the results are encouraging.

Let us finally remark that some authors stress (cf. [30]) that semantic categorizations cannot be modelled by set logic. Experiments were reported where, for instance, people were willing to accept that chairs are a type of furniture and that carseats are a type of chair, but would then deny that carseats are a type of furniture (for a review cf. [31]). Trying to model the meanings of 'furniture', 'chair', 'carseat' by means of set-theoretical constructions, one arrives at a contradiction with the inequality $P(A \wedge B \wedge C) \le P(A \wedge C)$ (cf. also [32]). In QIT this type of contradiction is at the root of the Bell inequality violation, whose proof is based on set-theoretic constructions, while QIT employs tensor structures in Hilbert spaces. Similarly, tests of tensor structures via SA may play a role in AI, quantitative linguistics, or experimental psychology analogous to the one the Bell inequality played for hidden-variable theories.